Abstract

In this paper, we show the equivalence of the set of unitaries computable by the circuits over
the Clifford and T library and the set of unitaries over the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$, in the single-qubit case.
We report an efficient synthesis algorithm, with an exact optimality guarantee on the number of
Hadamard and T gates used. We conjecture that the equivalence of the sets of unitaries implementable
by circuits over the Clifford and T library and unitaries over the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ holds in
the n-qubit case.



1 Introduction 

The problem of efficient approximation of an arbitrary unitary using a finite gate set is important 
in quantum computation. In particular, fault tolerance methods impose limitations on the set of 
elementary gates that may be used on the logical (as opposed to physical) level. One of the most
common of such sets consists of Clifford$^{1}$ and $T := \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}$ gates. This gate library is known to
be approximately universal in the sense of the existence of an efficient approximation of the unitaries by
circuits over it. In the single-qubit case, the standard solution to the problem of unitary approximation
by circuits over a gate library is given by the Solovay-Kitaev algorithm [1]. The multiple qubit case
may be handled via employing results from [2] that show how to decompose any n-qubit unitary into a
circuit with CNOT and single-qubit gates. Given precision $\varepsilon$, the Solovay-Kitaev algorithm produces
a sequence of gates of length $O(\log^c(1/\varepsilon))$ and requires time $O(\log^d(1/\varepsilon))$, for positive constants c
and d.



1 Also known as stabilizer gates/library. In the single-qubit case the Clifford library consists of, e.g., Hadamard and
Phase gates. In the multiple qubit case, the two-qubit CNOT gate is also included in the Clifford library.



While the Solovay-Kitaev algorithm provides a provably efficient approximation, it does not guarantee 
finding an exact decomposition of the unitary into a circuit if there is one, nor does it answer the 
question of whether an exact implementation exists. We refer to these as the problems of exact synthesis. 
Studying the problems related to exact synthesis is the focus of our paper. In particular, we study the 
relation between single-qubit unitaries and circuits composed with Clifford and T gates. We answer 
two main questions: first, given a unitary how to efficiently decide if it can be synthesized exactly, 
and second, how to find an efficient gate sequence that implements a given single-qubit unitary exactly 
(limited to the scenario when such an implementation exists, which we know from answering the first 
of the two questions). We further provide some intuition about the multiple qubit case. 

Our motivation for this study is rooted in the observation that the implementations of quantum algo- 
rithms exhibit errors from multiple sources, including (1) algorithmic errors resulting from the math- 
ematical probability of measuring a correct answer being less than one for many quantum algorithms 
[3], (2) errors due to decoherence [3], (3) systematic errors and imperfections in controlling apparatus 
(e.g., [4]), and (4) errors arising from the inability to implement a desired transformation exactly using 
the available finite gate set requiring one to resort to approximations. Minimizing the effect of errors 
has direct implications on the resources needed to implement an algorithm and sometimes determines 
the very ability to implement a quantum algorithm and demonstrate it experimentally on available 
hardware of a specific size. We set out to study the fourth type of error, rule those out whenever pos- 
sible, and identify situations when such approximation errors cannot be avoided. During the course of 
this study we have also identified that we can prove certain tight and constructive upper bounds on the 
circuit size for those unitaries that may be implemented exactly. In particular, we report a single-qubit 
circuit synthesis algorithm that guarantees optimality of both Hadamard and T gate counts. 

The remainder of the paper is organized as follows. In the next section, we summarize and discuss our
main results. Follow-up sections contain the necessary proofs. In Section 3, we reduce the problem of
single-qubit unitary synthesis to the problem of state preparation. In Section 4, we discuss two major technical
lemmas required to prove our main result summarized in Theorem 1. We also present an algorithm
for efficient decomposition of single-qubit unitaries in terms of Hadamard, $H := \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$, and T
gates. Section 5 and Appendix A flesh out formal proofs of minor technical results used in Section 4.
Appendix B contains a proof showing that the number of Hadamard and T gates in the circuits produced
by Algorithm 1 is minimal.



2 Formulation and discussion of the results 

One of our two main results reported in this paper is the following theorem: 

Theorem 1. The set of $2 \times 2$ unitaries over the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ is equivalent to the set of those unitaries
implementable exactly as single-qubit circuits constructed using$^{2}$ H and T gates only.

The inclusion of the set of unitaries implementable exactly via circuits employing H and T gates into
the set of $2 \times 2$ unitaries over the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ is straightforward, since all four elements of
each of the unitary matrices H and T belong to the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$, and circuit composition is equivalent
to matrix multiplication in the unitary matrix formalism. Since both operations used in the standard
definition of matrix multiplication, "+" and "×", applied to the ring elements, do not take
us outside the ring, each circuit constructed using H and T gates computes a matrix whose elements
belong to the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$. The inverse inclusion is more difficult to prove. The proof is discussed in
Sections 3-5 and Appendix A.

2 Note that the gate H may be replaced with all Clifford group gates without changing the meaning of the theorem; doing so may help
to visually bridge this formulation with the formulation of the follow-up general conjecture.




Figure 1: Circuit implementing the controlled-T gate, with upper qubit being the control, middle 
qubit being the target, and bottom qubit being the ancilla. Reprinted from [5]. 



We believe the statement of Theorem 1 may be extended and generalized into the following conjecture:

Conjecture 1. For n > 1, the set of $2^n \times 2^n$ unitaries over the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ is equivalent to the set of
unitaries implementable exactly as circuits with Clifford and T gates built using (n + 1) qubits, where
the last qubit, an ancillary qubit, is set to the value $|0\rangle$ prior to the circuit computation, and is required
to be returned in the state $|0\rangle$ at the end of it.

Note that the ancillary qubit need not be used if its use is not required. However, we next show that
the requirement to include a single ancillary qubit is essential: if removed, the statement of Conjecture
1 would be false. The necessity of this condition reflects the vast difference between the
single-qubit case and the n-qubit case for n > 1.

We wish to illustrate the necessity of the single ancilla with the use of the controlled-T gate, defined as
follows:

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & \omega \end{pmatrix},$$

where $\omega := e^{2\pi i/8}$ is a primitive eighth root of unity. The determinant of this unitary is $\omega$. However, any Clifford
gate as well as the T gate viewed as matrices over a set of two qubits have a determinant that is a power
of the imaginary number i. Using the multiplicative property of the determinant we conclude that the
circuits over the Clifford and T library may implement only those unitaries whose determinant is a power
of the imaginary i. As such, the controlled-T, whose determinant equals $\omega$, cannot be implemented
as a circuit with Clifford and T gates built using only two qubits. It is also impossible to implement
the controlled-T up to global phase. The reason is that the only complex numbers of the form $e^{i\phi}$
that belong to the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ are $\omega^k$ for integer k, as shown in Appendix A. Therefore, global
phase can only change the determinant by a multiplicative factor of $\omega^{4k}$. However, as reported in [5] and
illustrated in Figure 1, an implementation of the controlled-T over a set of three qubits, one of which
is set to and returned in the state $|0\rangle$, exists. With the addition of an ancillary qubit, as described, the
determinant argument fails, because one would now need to look at the determinant of a subsystem,
that, unlike the whole system, may be manipulated in such a way as to allow the computation to
happen.
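The determinant argument can be written out in one line. The display below is only a sketch of the reasoning in the preceding paragraph; the symbol $\Lambda(T)$ for the controlled-T matrix is our shorthand and not notation from the paper.

```latex
\det\bigl(\omega^{k}\,\Lambda(T)\bigr)
  = \omega^{4k}\,\det\Lambda(T)
  = (-1)^{k}\,\omega \in \{\omega,\ \omega^{5}\},
\qquad
\det(\text{two-qubit Clifford and T circuit})
  \in \{1,\ i,\ -1,\ -i\} = \{\omega^{0},\ \omega^{2},\ \omega^{4},\ \omega^{6}\}.
```

The two sets are disjoint, so no global phase $\omega^{k}$ can make the controlled-T agree with a two-qubit Clifford and T circuit.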

Theorem 1 provides an easy to verify criteria that reliably differentiates between unitaries imple- 
mentable in the H and T library and those requiring approximation. As an example, R x (^) and 
gates such as R z {^), where m > 3, popular in the construction of circuits for the Quantum Fourier 
Transform (QFT), cannot be implemented exactly and must be approximated. Thus, the error in 
approximations may be an unavoidable feature for certain quantum computations. Furthermore, Conjecture 1,
whose one inclusion is trivial (all Clifford and T circuits compute unitaries over the ring
$\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$), implies that the QFT over more than three qubits may not be computed exactly as a circuit
with Clifford and T gates, and must be approximated.

Our second major result is an algorithm (Algorithm 1) that synthesizes a quantum single-qubit circuit
using gates H, $Z := T^4$, $P := T^2$, and T in time $O(n_{opt})$, where $n_{opt}$ is the minimal number of gates
required to implement a given unitary. Technically, the above complexity calculation assumes that the
operations over the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ take a fixed finite amount of time. In terms of bit operations, however,
this time is quadratic in $n_{opt}$. Nevertheless, assuming ring operations take constant time, the efficiency
has a surprising implication. In particular, it is easy to show that our algorithm is asymptotically
optimal, in terms of both its speed and quality guarantees, among all algorithms (whether known
or not) solving the problem of synthesis in the single-qubit case. Indeed, a natural lower bound to
accomplish the task of synthesizing a unitary is $n_{opt}$, the minimal time it takes to simply write down
an optimal circuit assuming a certain algorithm somehow knows what it actually is. Our algorithm
features the upper bound of $O(n_{opt})$, matching the lower bound and implying asymptotic optimality.
To state the above somewhat differently, the difficulty in approximating a unitary by a circuit lies in
finding an approximating unitary with elements in the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$, not in composing the circuit
itself. We formally show H- and T-optimality of the circuits synthesized by Algorithm 1 in Appendix B.

The T-optimality of circuit decompositions has been a topic of study of the recent paper [6]. We note
that our algorithm guarantees both T- and H-optimality, whereas the one reported in [6] guarantees
only T-optimality. Furthermore, our implementation allows a trade-off between the number of Phase 
and Pauli-Z gates (the number of other gates used, being Pauli-X and Pauli-Y, does not exceed a total 
of three). We shared our software implementation and circuits obtained from it to facilitate proper 
comparison of the two synthesis algorithms. 

In the recent literature, similar topics have also been studied in [5] who concentrated on finding depth- 
optimal multiple qubit quantum circuits in the Clifford and T library, [7] who developed a normal form 
for single-qubit quantum circuits using gates H, P, and T, and [1, 8] who considered improvements of 
the Solovay-Kitaev algorithm that are very relevant to our work. In fact, we employ the Solovay-Kitaev 
algorithm as a tool to find an approximating unitary that we can then synthesize using our algorithm 
for exact single-qubit unitary synthesis. 

3 Reducing unitary implementation to state preparation 

In this section we discuss the connection between state preparation and implementation of a unitary 
by a quantum circuit. In the next section, we prove the following result: 

Lemma 1. Any single-qubit state with entries in the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ can be prepared using only H and T
gates given the initial state $|0\rangle$.

We first establish why Lemma 1 implies that any single-qubit unitary with entries in the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$
can be implemented exactly using H and T gates.

Observe that any single-qubit unitary can be written in the form

$$\begin{pmatrix} z & -w^* e^{i\phi} \\ w & z^* e^{i\phi} \end{pmatrix},$$

where * denotes the complex conjugate. The determinant of the unitary is equal to $e^{i\phi}$ and belongs
to the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ when all entries of the unitary belong to the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$. It turns out that the
only elements in the ring with the absolute value of 1 are $\omega^k$ for integer k. We postpone the proof; it
follows from techniques developed in Appendix A and discussed at the end of the appendix. For now,
we conclude that the most general form of a unitary with entries in the ring is:

$$\begin{pmatrix} z & -w^* \omega^{k} \\ w & z^* \omega^{k} \end{pmatrix}.$$

Table 1: First four elements of the sequence $(HT)^n |0\rangle$.

We next show how to find a circuit that implements any such unitary when we know a circuit that
prepares its first column given the state $|0\rangle$. Suppose we have a circuit that prepares the state $\begin{pmatrix} z \\ w \end{pmatrix}$.
This means that the first column of a unitary corresponding to the circuit is $\begin{pmatrix} z \\ w \end{pmatrix}$ and there exists
an integer k' such that the unitary is equal to:

$$\begin{pmatrix} z & -w^* \omega^{k'} \\ w & z^* \omega^{k'} \end{pmatrix}.$$



We can synthesize all possible unitaries with the first column $(z, w)$ by multiplying the unitary above
by a power of T from the right:

$$\begin{pmatrix} z & -w^* \omega^{k'} \\ w & z^* \omega^{k'} \end{pmatrix} T^{k-k'} = \begin{pmatrix} z & -w^* \omega^{k} \\ w & z^* \omega^{k} \end{pmatrix}.$$

This also shows that given a circuit for state preparation of length n we can always find a circuit for
unitary implementation of length n + O(1) and vice versa.
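The column-completion step can be checked numerically. The following is a minimal sketch using floating-point complex numbers rather than exact ring arithmetic; the helper names `complete_unitary`, `times_T_power` and `is_unitary` are ours and purely illustrative.

```python
import cmath

OMEGA = cmath.exp(1j * cmath.pi / 4)  # primitive eighth root of unity


def complete_unitary(z, w, k):
    """Unitary with first column (z, w) and second column (-w*, z*) * omega**k."""
    phase = OMEGA ** k
    return [[z, -w.conjugate() * phase],
            [w, z.conjugate() * phase]]


def times_T_power(u, j):
    """Right-multiply u by T**j = diag(1, omega**j); only the second column changes."""
    p = OMEGA ** j
    return [[u[0][0], u[0][1] * p],
            [u[1][0], u[1][1] * p]]


def is_unitary(u, tol=1e-12):
    """Check that the columns of the 2x2 matrix u are orthonormal."""
    c0 = abs(u[0][0]) ** 2 + abs(u[1][0]) ** 2
    c1 = abs(u[0][1]) ** 2 + abs(u[1][1]) ** 2
    overlap = u[0][0].conjugate() * u[0][1] + u[1][0].conjugate() * u[1][1]
    return abs(c0 - 1) < tol and abs(c1 - 1) < tol and abs(overlap) < tol


# Example: the state (1 + omega, 1 - omega) / 2, completed with k' = 3,
# is turned into the completion with k = 5 by the factor T**(5 - 3).
z, w = (1 + OMEGA) / 2, (1 - OMEGA) / 2
u = times_T_power(complete_unitary(z, w, 3), 5 - 3)
```

Because T is diagonal with first entry 1, the first column (the prepared state) is untouched, which is why only a constant number of extra gates is needed.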



4 Sequence for state preparation 

We start with an example that illustrates the main ideas needed to prove Lemma 1. Next we formulate
two results, Lemma 2 and Lemma 3, that the proof of Lemma 1 is based on. Afterwards, we describe
the algorithm for decomposition of a unitary with entries in the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ into a sequence of H and
T gates. Finally, we prove Lemma 2. The proof of Lemma 3 is more involved and it is shown in Section
5.

Let us consider a sequence of states $(HT)^n |0\rangle$. It is an infinite sequence, since in the Bloch sphere
picture the unitary HT corresponds to a rotation over an angle that is an irrational fraction of $\pi$. Table 1
shows the first four elements of the sequence.

There are two features in this example that are important. First is that the power of $\sqrt{2}$ in the
denominator of the entries is the same. We prove that the power of the denominator is the same in the
general case of a unit vector with entries in the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$. The second feature is that the power of $\sqrt{2}$
in the denominator of $|z_n|^2$ increases by 1 after multiplication by HT. We show that in general, under
additional assumptions, multiplication by $H T^k$ cannot change the power of $\sqrt{2}$ in the denominator
by more than 1. Importantly, under the same additional assumptions it is always possible to find such
an integer k that the power increases or decreases by 1.

We need to clarify what we mean by the power of $\sqrt{2}$ in the denominator, because, for example, it is
possible to write $\frac{1}{\sqrt{2}}$ as $\frac{\sqrt{2}}{2}$. As such, it may seem that the power of $\sqrt{2}$ in the denominator of a
number from the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ is not well defined. To address this issue we consider the subring

$$\mathbb{Z}[\omega] := \{a + b\omega + c\omega^2 + d\omega^3 : a, b, c, d \in \mathbb{Z}\}$$

of the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ and the smallest denominator exponent. These definitions are also crucial for our proofs.

It is natural to extend the notion of divisibility to elements of $\mathbb{Z}[\omega]$: x divides y when there exists x'
from the ring $\mathbb{Z}[\omega]$ such that $xx' = y$. Using the divisibility relation we can introduce the smallest
denominator exponent and greatest dividing exponent.

Definition 1. The smallest denominator exponent, $\mathrm{sde}(z, x)$, of base $x \in \mathbb{Z}[\omega]$ with respect to $z \in \mathbb{Z}[\frac{1}{\sqrt{2}}, i]$
is the smallest integer value k such that $z x^k \in \mathbb{Z}[\omega]$. If there is no such k, the smallest
denominator exponent is infinite.

For example, $\mathrm{sde}(1/4, \sqrt{2}) = 4$ and $\mathrm{sde}(2\sqrt{2}, \sqrt{2}) = -3$. The smallest denominator exponent of base
$\sqrt{2}$ is finite for all elements of the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$. The greatest dividing exponent is closely connected to
sde.

Definition 2. The greatest dividing exponent, $\mathrm{gde}(z, x)$, of base $x \in \mathbb{Z}[\omega]$ with respect to $z \in \mathbb{Z}[\omega]$ is
the integer value k such that $x^k$ divides z and x does not divide the quotient $z / x^k$. If no such k exists, the
greatest dividing exponent is said to be infinite.

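For example, $\mathrm{gde}(z, \omega^n) = \infty$, since $\omega^n$ divides any element of $\mathbb{Z}[\omega]$, and $\mathrm{gde}(0, x) = \infty$. For any
non-zero base $x \in \mathbb{Z}[\omega]$, gde and sde are related via a simple formula:

$$\mathrm{sde}\left(\frac{z}{x^k}, x\right) = k - \mathrm{gde}(z, x). \qquad (1)$$

This follows from the definitions of sde and gde. First, the assumption $\mathrm{gde}(z, x) = k_0$ means that $z / x^{k_0}$
belongs to $\mathbb{Z}[\omega]$, which implies $\mathrm{sde}(\frac{z}{x^k}, x) \le k - k_0$. Second, the assumption $\mathrm{sde}(\frac{z}{x^k}, x) = k_0$ means that
$x^{k - k_0}$ divides z, which implies $\mathrm{gde}(z, x) \ge k - k_0$. Since
both inequalities need to be satisfied simultaneously, this implies the equality.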
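The two definitions and relation (1) can be made concrete for base $\sqrt{2}$. The sketch below assumes a representation of our own choosing: an element of $\mathbb{Z}[\omega]$ is a tuple of four integer coordinates, and an element of $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ is such a tuple divided by a power of $\sqrt{2}$. The function names are hypothetical.

```python
import math

# Assumed representation (ours): (a, b, c, d) stands for
# a + b*w + c*w**2 + d*w**3 with w = exp(i*pi/4), so that w**4 = -1.


def mul_sqrt2(x):
    """Multiply x by sqrt(2) = w - w**3."""
    a, b, c, d = x
    return (b - d, a + c, b + d, c - a)


def div_sqrt2(x):
    """Return x / sqrt(2) if it lies in Z[w], otherwise None.

    x / sqrt(2) equals sqrt(2) * x / 2, so x is divisible by sqrt(2)
    exactly when every coordinate of sqrt(2) * x is even.
    """
    y = mul_sqrt2(x)
    if any(c % 2 for c in y):
        return None
    return tuple(c // 2 for c in y)


def gde(x):
    """Greatest dividing exponent of x in Z[w] with respect to sqrt(2)."""
    if not any(x):
        return math.inf  # every power of sqrt(2) divides 0
    k = 0
    while (q := div_sqrt2(x)) is not None:
        x, k = q, k + 1
    return k


def sde(x, k):
    """Smallest denominator exponent of x / sqrt(2)**k, via relation (1)."""
    return k - gde(x)


print(sde((1, 0, 0, 0), 4))   # 1/4 = 1 / sqrt(2)**4, prints 4
print(sde((0, 2, 0, -2), 0))  # 2*sqrt(2) = 2*w - 2*w**3, prints -3
```

The two printed values reproduce the examples given after Definition 1.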

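We are now ready to introduce two results that describe the change of the sde as a result of the
application of $H T^k$ to a state:

Lemma 2. Let $\begin{pmatrix} z \\ w \end{pmatrix}$ be a state with entries in $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ and let $\mathrm{sde}(|z|^2) \ge 4$. Then, for any integer k:

$$-1 \le \mathrm{sde}\left(\left|\frac{z + w\omega^k}{\sqrt{2}}\right|^2\right) - \mathrm{sde}(|z|^2) \le 1. \qquad (2)$$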

The next lemma states that for almost all unit vectors the difference in (2) achieves all possible values,
when the power of $\omega$ is chosen appropriately.

Lemma 3. Let $\begin{pmatrix} z \\ w \end{pmatrix}$ be a state with entries in $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ and let $\mathrm{sde}(|z|^2) \ge 4$. Then, for each
number $s \in \{-1, 0, 1\}$ there exists an integer $k \in \{0, 1, 2, 3\}$ such that:

$$\mathrm{sde}\left(\left|\frac{z + w\omega^k}{\sqrt{2}}\right|^2\right) - \mathrm{sde}(|z|^2) = s.$$

These lemmas are essential for showing how to find a sequence of gates that prepares a state with
entries in the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ given the initial state $|0\rangle$. Now we sketch a proof of Lemma 1. Later, in
Lemma 4, we show that for arbitrary u and v from the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ the equality $|u|^2 + |v|^2 = 1$ implies
$\mathrm{sde}(|u|^2) = \mathrm{sde}(|v|^2)$, when $\mathrm{sde}(|u|^2) \ge 1$ and $\mathrm{sde}(|v|^2) \ge 1$. Therefore, under assumptions of Lemma
2, we may consider sde of a single entry in any given state. Lemma 3 implies that we can prepare any
state using H and T gates if we can prepare any state $\begin{pmatrix} z \\ w \end{pmatrix}$ such that $\mathrm{sde}(|z|^2) \le 3$. The set of states
with $\mathrm{sde}(|z|^2) \le 3$ is finite and small. Therefore, we can exhaustively verify that all such states can be
prepared using H and T gates given the initial state $|0\rangle$. In fact, we performed such verification using
a breadth-first search algorithm.

The statement of Lemma 3 remains true if we replace the set {0, 1, 2, 3} by {0, −1, −2, −3}. Lemma
3 results in Algorithm 1 for decomposition of a unitary matrix with entries in the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ into a
sequence of H and T gates. Its complexity is $O(\mathrm{sde}(|z|^2))$, where z is an entry of the unitary. The idea
behind the algorithm is as follows: given a $2 \times 2$ unitary U over the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ and sde ≥ 4, there
is a value of k in {0, 1, 2, 3} such that the multiplication by $H T^k$ reduces the sde by 1. Thus, after
n − 4 steps, we have expressed

$$U = H T^{k_1} H T^{k_2} \cdots H T^{k_{n-4}} U',$$

where any entry z' of U' has the property $\mathrm{sde}(|z'|^2) < 4$. The number of such unitaries is small enough
to handle the decomposition of U' via employing a breadth-first search algorithm.

We use $n_{opt}(U)$ to denote the smallest length of a circuit that implements U.

Corollary 1. Algorithm 1 produces a circuit of length $O(n_{opt}(U))$ and uses $O(n_{opt}(U))$ arithmetic operations.
The number of bit operations it uses is $O(n_{opt}^2(U))$.

Proof. Lemma 4, proved later in this section, implies that the value of $\mathrm{sde}(|\cdot|^2)$ is the same for
all entries of U when the sde of at least one entry is greater than 0. For such unitaries we define
$\mathrm{sde}^{|\cdot|^2}(U) = \mathrm{sde}(|z'|^2)$, where z' is an entry of U. The remaining special case is unitaries of the form

$$\begin{pmatrix} 0 & \omega^{k} \\ \omega^{j} & 0 \end{pmatrix}, \qquad \begin{pmatrix} \omega^{k} & 0 \\ 0 & \omega^{j} \end{pmatrix}.$$

We define $\mathrm{sde}^{|\cdot|^2}$ to be 0 for all of them. Consider a set $S_{opt,3}$ of optimal H and T circuits for unitaries
with $\mathrm{sde}^{|\cdot|^2} \le 3$. This is a finite set and therefore we can define $N_3$ to be the maximal number of
gates in a circuit from $S_{opt,3}$. If we have a circuit that is optimal and its length is greater than $N_3$,
the corresponding unitary must have $\mathrm{sde}^{|\cdot|^2} \ge 4$. Consider now a unitary U with an optimal circuit
of length $n_{opt}(U)$ that is larger than $N_3$. As it is optimal, all its subsequences are optimal and it does
not include $H^2$. Let $N_{H,3}$ be the maximum of the number of Hadamard gates used by the circuits in
$S_{opt,3}$. An optimal circuit for U includes at most $\left\lceil \frac{n_{opt}(U) - N_3}{2} \right\rceil + N_{H,3}$ Hadamard gates and, by Lemma
2, $\mathrm{sde}^{|\cdot|^2}$ of the resulting unitary is less than or equal to $N_{H,3} + 3 + \left\lceil \frac{n_{opt}(U) - N_3}{2} \right\rceil$. We conclude that for all
unitaries except a finite set:

$$\mathrm{sde}^{|\cdot|^2}(U) \le N_{H,3} + 3 + \left\lceil \frac{n_{opt}(U) - N_3}{2} \right\rceil.$$

From the other side, the decomposition algorithm we described gives us the bound

$$n_{opt}(U) \le N_3 + 4 \cdot \mathrm{sde}^{|\cdot|^2}(U).$$

We conclude that $n_{opt}(U)$ and $\mathrm{sde}^{|\cdot|^2}(U)$ are asymptotically equivalent. Therefore the algorithm's runtime
is $O(n_{opt}(U))$, because the algorithm performs $\mathrm{sde}^{|\cdot|^2}(U) - 4$ steps.

We note that to store U we need $O(\mathrm{sde}^{|\cdot|^2}(U))$ bits and therefore the addition on each step of
the algorithm requires $O(\mathrm{sde}^{|\cdot|^2}(U))$ bit operations. Therefore we use $O(n_{opt}^2(U))$ bit operations in
total. □

This proof illustrates the technique that we use in Appendix B to find a tighter connection between
sde and the circuit implementation cost. In particular, we prove that circuits produced by the algorithm
are H- and T-optimal.

Algorithm 1 Decomposition of a unitary matrix with entries in the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$.

Input: Unitary $U = \begin{pmatrix} z_{00} & z_{01} \\ z_{10} & z_{11} \end{pmatrix}$ with entries in the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$.
$S_3$ - table of all unitaries over the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$, such that sde of their entries is less than or equal to 3.
Output: Sequence $S_{out}$ of H and T gates that implements U.
  $S_{out}$ <- Empty
  s <- $\mathrm{sde}(|z_{00}|^2)$
  while s > 3 do
    state <- unfound
    for all $k \in \{0, 1, 2, 3\}$ do
      if state = unfound then
        $z'_{00}$ <- top left entry of $H T^{-k} U$
        if $\mathrm{sde}(|z'_{00}|^2) = s - 1$ then
          state <- found
          add $T^k H$ to the end of $S_{out}$
          s <- $\mathrm{sde}(|z'_{00}|^2)$
          U <- $H T^{-k} U$
        end if
      end if
    end for
  end while
  lookup sequence $S_{rem}$ for U in $S_3$
  add $S_{rem}$ to the end of $S_{out}$
  return $S_{out}$
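The main loop of Algorithm 1 can be sketched in exact integer arithmetic. This is only an illustration under our own assumptions, not the authors' implementation: it works on a state (the first column of U) instead of the full unitary, it stops once the sde reaches 3 instead of consulting the table $S_3$, and the tuple representation and all function names are ours. The sde of $|z|^2$ is evaluated through the quadratic forms of Section 5 and Proposition 1.

```python
import math

# Assumed representation (ours): (a, b, c, d) stands for
# a + b*w + c*w**2 + d*w**3 with w = exp(i*pi/4); a state is (z, w, m),
# meaning the column vector (z, w) / sqrt(2)**m.


def add(x, y):
    return tuple(p + q for p, q in zip(x, y))


def neg(x):
    return tuple(-p for p in x)


def mul_omega(x, n):
    """Multiply x by w**n, using w**4 = -1 (n may be negative)."""
    for _ in range(n % 8):
        a, b, c, d = x
        x = (-d, a, b, c)
    return x


def mul_sqrt2(x):
    """Multiply x by sqrt(2) = w - w**3."""
    a, b, c, d = x
    return (b - d, a + c, b + d, c - a)


def gde2(n):
    """Largest e such that 2**e divides the integer n (infinite for 0)."""
    if n == 0:
        return math.inf
    e = 0
    while n % 2 == 0:
        n, e = n // 2, e + 1
    return e


def sde_abs2(state):
    """sde(|z|^2) of the top entry: 2m - gde(P + sqrt(2) Q)."""
    (a, b, c, d), _, m = state
    gp = gde2(a * a + b * b + c * c + d * d)  # P(z), equation (12)
    gq = gde2(a * (b - d) + c * (b + d))      # Q(z), equation (13)
    return 2 * m - (2 * gp if gq >= gp else 1 + 2 * gq)  # Proposition 1


def apply_T(state, k):
    z, w, m = state
    return (z, mul_omega(w, k), m)


def apply_H(state):
    z, w, m = state
    return (add(z, w), add(z, neg(w)), m + 1)


def reduce_state(state):
    """Peel off factors T^k H until sde(|z|^2) <= 3.

    Returns (ks, rest) such that state = T^k1 H T^k2 H ... rest.
    """
    ks, s = [], sde_abs2(state)
    while s > 3:
        for k in range(4):
            cand = apply_H(apply_T(state, -k))  # H T^{-k} applied to the state
            if sde_abs2(cand) == s - 1:
                ks.append(k)
                state, s = cand, s - 1
                break
        else:
            raise ValueError("no k lowers the sde; is the input a unit vector?")
    return ks, state


def same(s1, s2):
    """Compare two states after bringing them to a common power of sqrt(2)."""
    def scaled(state, m_target):
        z, w, m = state
        for _ in range(m_target - m):
            z, w = mul_sqrt2(z), mul_sqrt2(w)
        return z, w
    m = max(s1[2], s2[2])
    return scaled(s1, m) == scaled(s2, m)


# Example: build (HT)^4 |0> and reduce it again.
state = ((1, 0, 0, 0), (0, 0, 0, 0), 0)
for _ in range(4):
    state = apply_H(apply_T(state, 1))
ks, rest = reduce_state(state)
```

The inner `for ... else` plays the role of the `state = found` flag in the pseudocode: the first k that lowers the sde is accepted, and Lemma 3 is what guarantees that the `else` branch is never reached for a valid input with sde at least 4.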



We next prove Lemma 2. In Section 5 we use Lemma 2 to show that we can prove Lemma 3 by 
considering a large, but finite, number of different cases. We provide an algorithm (Algorithm 2) that 
verifies all these cases. 

We now proceed to the proof of Lemma 2. We use equation (1) connecting sde and gde together with
the following properties of gde. For any base $x \in \mathbb{Z}[\omega]$:

$$\mathrm{gde}(y + y', x) \ge \min(\mathrm{gde}(y, x), \mathrm{gde}(y', x)) \qquad (3)$$
$$\mathrm{gde}(y x^k, x) = k + \mathrm{gde}(y, x) \quad \text{(base extraction)} \qquad (4)$$
$$\mathrm{gde}(y, x) < \mathrm{gde}(y', x) \Rightarrow \mathrm{gde}(y + y', x) = \mathrm{gde}(y, x) \quad \text{(absorption)}. \qquad (5)$$

It is also helpful to note that $\mathrm{gde}(y, x)$ is invariant with respect to multiplication by $\omega$ and complex
conjugation of both x and y.
All these properties follow directly from the definition of gde; the first three are briefly discussed in
Appendix A. The condition $\mathrm{gde}(y, x) < \mathrm{gde}(y', x)$ is necessary for the third property. For example,
$\mathrm{gde}(\sqrt{2} + \sqrt{2}, \sqrt{2}) \ne \mathrm{gde}(\sqrt{2}, \sqrt{2})$.



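There are also important properties specific to base $\sqrt{2}$. We use the shorthand $\mathrm{gde}(\cdot)$ for $\mathrm{gde}(\cdot, \sqrt{2})$:

$$\mathrm{gde}(x) = \mathrm{gde}(|x|^2, 2) \qquad (6)$$
$$0 \le \mathrm{gde}(|x|^2) - 2\,\mathrm{gde}(x) \le 1 \qquad (7)$$
$$\mathrm{gde}(\mathrm{Re}(\sqrt{2}\, x y^*)) \ge \left\lfloor \tfrac{1}{2}\left(\mathrm{gde}(|x|^2) + \mathrm{gde}(|y|^2)\right) \right\rfloor \qquad (8)$$
$$\mathrm{gde}(|x|^2) = \mathrm{gde}(|y|^2) \Rightarrow \mathrm{gde}(x) = \mathrm{gde}(y). \qquad (9)$$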

Proofs of these properties are not difficult but tedious; for completeness, they are included
in Appendix A. We exemplify them here. In the second property, inequality (7), when $x = \omega$ the left
inequality becomes an equality and for $x = \omega + 1$ the right one does. When we substitute $x = \omega$, $y = \omega + 1$
in the second to last property, inequality (8), it turns into $0 = \lfloor \frac{1}{2} \rfloor$, so the floor function $r \mapsto \lfloor r \rfloor$ is
necessary. For the third property it is important that $\mathrm{Re}(\sqrt{2}\, x y^*)$ is an element of $\mathbb{Z}[\omega]$ when x and y themselves
belong to the ring $\mathbb{Z}[\omega]$. In contrast, $\mathrm{Re}(x y^*)$ is not always an element of $\mathbb{Z}[\omega]$, in particular, when
$x = \omega$, $y = \omega + 1$. In general, $\mathrm{gde}(x) = \mathrm{gde}(y)$ does not imply $\mathrm{gde}(|x|^2) = \mathrm{gde}(|y|^2)$. For instance,
$\mathrm{gde}(\omega + 1) = \mathrm{gde}(\omega)$, but $|\omega + 1|^2 = 2 + \sqrt{2}$ and $|\omega|^2 = 1$.
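Properties (6) and (7) and the two examples above can be spot-checked by brute force over small elements of $\mathbb{Z}[\omega]$. The sketch below assumes our own coordinate representation (a tuple of four integers) and hypothetical helper names; it is a sanity check on a finite range, not a proof.

```python
import itertools

# Assumed representation (ours): (a, b, c, d) stands for
# a + b*w + c*w**2 + d*w**3 with w = exp(i*pi/4).


def mul_sqrt2(x):
    """Multiply x by sqrt(2) = w - w**3."""
    a, b, c, d = x
    return (b - d, a + c, b + d, c - a)


def gde_sqrt2(x):
    """gde of a nonzero x in Z[w] with respect to sqrt(2)."""
    k = 0
    while True:
        y = mul_sqrt2(x)  # x / sqrt(2) = sqrt(2) * x / 2
        if any(c % 2 for c in y):
            return k
        x, k = tuple(c // 2 for c in y), k + 1


def gde_two(x):
    """gde of a nonzero x in Z[w] with respect to 2."""
    k = 0
    while not any(c % 2 for c in x):
        x, k = tuple(c // 2 for c in x), k + 1
    return k


def abs2(x):
    """|x|^2 = P + sqrt(2) Q, returned as the element (P, Q, 0, -Q) of Z[w]."""
    a, b, c, d = x
    p = a * a + b * b + c * c + d * d
    q = a * (b - d) + c * (b + d)
    return (p, q, 0, -q)


def check_small_elements(bound=2):
    """Check (6) and (7) for all nonzero x with coordinates in [-bound, bound]."""
    for x in itertools.product(range(-bound, bound + 1), repeat=4):
        if not any(x):
            continue
        g, n = gde_sqrt2(x), abs2(x)
        if gde_two(n) != g or not 0 <= gde_sqrt2(n) - 2 * g <= 1:
            return False
    return True


OMEGA = (0, 1, 0, 0)
OMEGA_PLUS_1 = (1, 1, 0, 0)
print(check_small_elements())
```

For the two named elements, `gde_sqrt2` agrees on x itself while `gde_sqrt2(abs2(...))` differs, matching the remark that equal gde of x and y does not force equal gde of the squared absolute values.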

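In the proof of Lemma 2 we use $x = z(\sqrt{2})^{\mathrm{sde}(z)}$, $y = w(\sqrt{2})^{\mathrm{sde}(w)}$ that are elements of $\mathbb{Z}[\omega]$. The
next lemma shows an additional property that such x and y have.

Lemma 4. Let z and w be elements of the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ such that $|z|^2 + |w|^2 = 1$ and $\mathrm{sde}(z) \ge 1$ or
$\mathrm{sde}(w) \ge 1$. Then $\mathrm{sde}(z) = \mathrm{sde}(w)$ and for elements $x = z(\sqrt{2})^{\mathrm{sde}(z)}$ and $y = w(\sqrt{2})^{\mathrm{sde}(w)}$ of the ring
$\mathbb{Z}[\omega]$ it holds that $\mathrm{gde}(|x|^2) = \mathrm{gde}(|y|^2) \le 1$.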



Proof. Without loss of generality, suppose $\mathrm{sde}(z) \ge \mathrm{sde}(w)$. Using the relation in equation (1) between
sde and gde, expressing z and w in terms of x and y, and substituting the result into the equation
$|z|^2 + |w|^2 = 1$, we obtain

$$|y|^2 (\sqrt{2})^{2(\mathrm{sde}(z) - \mathrm{sde}(w))} = (\sqrt{2})^{2\,\mathrm{sde}(z)} - |x|^2.$$

Substituting $z = x / (\sqrt{2})^{\mathrm{sde}(z)}$ into formula (1) relating sde and gde, we obtain $\mathrm{gde}(x) = 0$, and using
one of the inequalities (7) connecting $\mathrm{gde}(|x|^2)$ and $\mathrm{gde}(x)$ we conclude that $\mathrm{gde}(|x|^2) \le 1$. Similarly,
$\mathrm{gde}(|y|^2) \le 1$. We use the absorption property (5) of $\mathrm{gde}(\cdot)$ to write:

$$\mathrm{gde}\left(|y|^2 (\sqrt{2})^{2(\mathrm{sde}(z) - \mathrm{sde}(w))}\right) = \mathrm{gde}(|x|^2).$$

Equivalently, using the base extraction property (4):

$$\mathrm{gde}(|y|^2) + 2(\mathrm{sde}(z) - \mathrm{sde}(w)) = \mathrm{gde}(|x|^2).$$

Taking into account $\mathrm{gde}(|x|^2) \le 1$ and $\mathrm{gde}(|y|^2) \le 1$, it follows that $\mathrm{sde}(z) = \mathrm{sde}(w)$. □

In the proof of Lemma 2 we turn inequality (2) for the difference of sde into an inequality for the difference
of $\mathrm{gde}(|x|^2)$ and $\mathrm{gde}(|x + y|^2)$. The following lemma shows a basic relation between these quantities
that we will use.

Lemma 5. If x and y are elements of the ring $\mathbb{Z}[\omega]$ such that $|x|^2 + |y|^2 = (\sqrt{2})^m$, then

$$\mathrm{gde}(|x + y|^2) \ge \min\left(m,\ 1 + \left\lfloor \tfrac{1}{2}\left(\mathrm{gde}(|x|^2) + \mathrm{gde}(|y|^2)\right) \right\rfloor\right).$$



Proof. The first step is to expand $|x + y|^2$ as $|x|^2 + |y|^2 + \sqrt{2}\,\mathrm{Re}(\sqrt{2}\, x y^*)$. Next, we apply inequality
(3) to the gde of the sum, and then the base extraction property (4) of the gde. We use the equality
$\mathrm{gde}(|x|^2 + |y|^2) = m$ to conclude that

$$\mathrm{gde}(|x + y|^2) \ge \min\left(m,\ 1 + \mathrm{gde}(\mathrm{Re}(\sqrt{2}\, x y^*))\right).$$

Finally, we use inequality (8) for $\mathrm{gde}(\mathrm{Re}(\sqrt{2}\, x y^*))$ to derive the statement of the lemma. □
We have now collected all the tools required to prove Lemma 2.



Proof. Recall that we are proving that for elements z and w of the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ and any integer k it
is true that:

$$-1 \le \mathrm{sde}\left(\left|\frac{z + w\omega^k}{\sqrt{2}}\right|^2\right) - \mathrm{sde}(|z|^2) \le 1, \quad \text{when } \mathrm{sde}(|z|^2) \ge 4.$$

Using Lemma 4 we can define $m = \mathrm{sde}(z) = \mathrm{sde}(w\omega^k)$ and $x = z(\sqrt{2})^m$ and $y = \omega^k w(\sqrt{2})^m$.
Using the relation (1) between gde and sde, and the base extraction property (4) of gde we rewrite the
inequality we are trying to prove as:

$$1 \le \mathrm{gde}(|x + y|^2) - \mathrm{gde}(|x|^2) \le 3.$$

It follows from Lemma 4 that $\mathrm{gde}(|x|^2) = \mathrm{gde}(|y|^2) \le 1$. Taking into account $|x|^2 + |y|^2 = (\sqrt{2})^{2m}$
and applying the inequality proved in Lemma 5 to x and y we conclude that:

$$\mathrm{gde}(|x + y|^2) \ge \min\left(2m,\ 1 + \mathrm{gde}(|x|^2)\right).$$

The condition $m \ge 4$ allows us to remove taking the minimum on the right hand side and replace it
with $1 + \mathrm{gde}(|x|^2)$. This proves one of the two inequalities we are trying to show, $1 \le \mathrm{gde}(|x + y|^2) - \mathrm{gde}(|x|^2)$.
To prove the second inequality, $\mathrm{gde}(|x + y|^2) - \mathrm{gde}(|x|^2) \le 3$, we apply Lemma 5 to the
pair of elements of the ring $\mathbb{Z}[\omega]$, $x + y$ and $x - y$. The conditions of the lemma are satisfied because
$|x + y|^2 + |x - y|^2 = (\sqrt{2})^{2(m+1)}$. Therefore:

$$\mathrm{gde}(4|x|^2) \ge \min\left(2(m + 1),\ 1 + \left\lfloor \tfrac{1}{2}\left(\mathrm{gde}(|x + y|^2) + \mathrm{gde}(|x - y|^2)\right) \right\rfloor\right).$$

Using the base extraction property (4), we notice that $\mathrm{gde}(4|x|^2) = 4 + \mathrm{gde}(|x|^2)$. It follows from
$m \ge 4$ that $2(m + 1) > 4 + \mathrm{gde}(|x|^2)$. As such, we can again remove the minimization and simplify
the inequality to:

$$3 + \mathrm{gde}(|x|^2) \ge \left\lfloor \tfrac{1}{2}\left(\mathrm{gde}(|x + y|^2) + \mathrm{gde}(|x - y|^2)\right) \right\rfloor.$$

To finish the proof it suffices to show that $\mathrm{gde}(|x + y|^2) = \mathrm{gde}(|x - y|^2)$. We establish an upper
bound for $\mathrm{gde}(|x + y|^2)$ and use the absorption property (5) of gde. Using non-negativity of gde and
the definition of the floor function we get:

$$2\left(3 + \mathrm{gde}(|x|^2)\right) + 1 \ge \mathrm{gde}(|x + y|^2).$$

Since $\mathrm{gde}(|x|^2) \le 1$, $\mathrm{gde}(|x + y|^2) \le 9$. Observing that $2(m + 1) > 9$ we confirm that

$$\mathrm{gde}(|x - y|^2) = \mathrm{gde}\left((\sqrt{2})^{2(m+1)} - |x + y|^2\right) = \mathrm{gde}(|x + y|^2).$$

□

To prove Lemma 3 it suffices to show that $\mathrm{gde}(|x + \omega^k y|^2) - \mathrm{gde}(|x|^2)$ achieves all values in the set
{1, 2, 3} as k varies over all values in the range from 0 to 3. We can split this into two cases: $\mathrm{gde}(|x|^2) = 1$
and $\mathrm{gde}(|x|^2) = 0$. We need to check if $\mathrm{gde}(|x + \omega^k y|^2)$ belongs to {1, 2, 3} or {2, 3, 4}. Therefore, it is
important to describe these conditions in terms of x and y. This is accomplished in the next section.



5 Quadratic forms and greatest dividing exponent 

We first clarify why it is enough to check a finite number of cases to prove Lemma 3. Recall how the
lemma can be restated in terms of the elements of the ring $\mathbb{Z}[\omega]$. Next we illustrate why we can achieve
a finite number of cases with a simple example using integer numbers $\mathbb{Z}$. Then we show how this idea
can be extended to the elements of the ring $\mathbb{Z}[\omega]$ that are real (that is, with imaginary part equal to
zero). Finally, in the proof of Lemma 3, we identify a set of cases that we need to check and provide
an algorithm to perform it.

As discussed at the end of the previous section, to prove Lemma 3 one can consider elements x and y
of the ring $\mathbb{Z}[\omega]$ such that $|x|^2 + |y|^2 = 2^m$ for $m \ge 4$. We know from Lemma 2 that there are three
possibilities in each of the two cases:

• when $\mathrm{gde}(|x|^2) = 0$, $\mathrm{gde}(|x + \omega^k y|^2)$ equals 1, 2, or 3,

• when $\mathrm{gde}(|x|^2) = 1$, $\mathrm{gde}(|x + \omega^k y|^2)$ equals 2, 3, or 4.

We want to show that each of these possibilities is achievable for a specific choice of $k \in \{0, 1, 2, 3\}$.

We illustrate the idea of the reduction to a finite number of cases with an example. Suppose we want
to describe two classes of integer numbers:

• integer a such that $\mathrm{gde}(a^2, 2) = 2$,

• integer a such that $\mathrm{gde}(a^2, 2) > 2$.

It is enough to know $a^2 \bmod 2^3$ to decide which class a belongs to. Therefore we can consider 8 residues
$a \bmod 2^3$ and find the classes to which they belong. We extend this idea to the real elements of the
ring $\mathbb{Z}[\omega]$, being elements of the ring $\mathbb{Z}[\omega]$ that are equal to their own real part. Afterwards we apply
the result to $|x + \omega^k y|^2$, which is a real element of $\mathbb{Z}[\omega]$.
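The integer example is small enough to check directly. The sketch below (function names are ours) classifies an integer a once from the exact value of $\mathrm{gde}(a^2, 2)$ and once from $a^2 \bmod 2^3$ alone; odd integers fall in neither of the two classes above and are reported under a third label.

```python
import math


def gde2(n):
    """Largest e such that 2**e divides the integer n (infinite for 0)."""
    if n == 0:
        return math.inf
    e = 0
    while n % 2 == 0:
        n, e = n // 2, e + 1
    return e


def classify_exact(a):
    """Class of a determined from the exact value of gde(a^2, 2)."""
    g = gde2(a * a)
    if g == 2:
        return "gde = 2"
    return "gde > 2" if g > 2 else "gde < 2"


def classify_by_residue(a):
    """Class of a determined only from a^2 mod 8."""
    r = (a * a) % 8
    if r == 4:
        return "gde = 2"
    return "gde > 2" if r == 0 else "gde < 2"


for residue in range(8):
    print(residue, classify_exact(residue))
```

Only the eight residues need to be inspected, which is the finiteness that the proof of Lemma 3 relies on.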

We note that the real elements of $\mathbb{Z}[\omega]$ are of the form $a + \sqrt{2}\, b$ where a and b are themselves integer
numbers. An important preliminary observation, that follows from the irrationality of $\sqrt{2}$, is that for
any integer number c

$$\mathrm{gde}(c) = 2\,\mathrm{gde}(c, 2). \qquad (10)$$

The next proposition gives a condition equivalent to $\mathrm{gde}(a + \sqrt{2}\, b) = k$, expressed in terms of $\mathrm{gde}(a, 2)$
and $\mathrm{gde}(b, 2)$:

Proposition 1. Let a and b be integer numbers. There are two possibilities:

• $\mathrm{gde}(a + \sqrt{2}\, b)$ is even if and only if $\mathrm{gde}(b, 2) \ge \mathrm{gde}(a, 2)$; in this case, $\mathrm{gde}(a, 2) = \mathrm{gde}(a + \sqrt{2}\, b) / 2$.

• $\mathrm{gde}(a + \sqrt{2}\, b)$ is odd if and only if $\mathrm{gde}(b, 2) < \mathrm{gde}(a, 2)$; in this case, $\mathrm{gde}(b, 2) = (\mathrm{gde}(a + \sqrt{2}\, b) - 1) / 2$.

Proof. Consider the case when $\mathrm{gde}(b, 2) < \mathrm{gde}(a, 2)$. Observing, from equation (10), that $\mathrm{gde}(a)$ is
always even, we get $\mathrm{gde}(a) > \mathrm{gde}(\sqrt{2}\,b)$, and by the absorption property (5) of gde we have
$\mathrm{gde}(a + \sqrt{2}\,b) = \mathrm{gde}(\sqrt{2}\,b)$. Using the base extraction property (4) of gde and the relation (10) between
$\mathrm{gde}(\cdot)$ and $\mathrm{gde}(\cdot, 2)$ for integers we obtain $\mathrm{gde}(a + \sqrt{2}\,b) = 1 + 2\,\mathrm{gde}(b, 2)$. The other case similarly
implies $\mathrm{gde}(a + \sqrt{2}\,b) = 2\,\mathrm{gde}(a, 2)$. In terms of real elements of the ring $\mathbb{Z}[\omega]$, this results in the
following relations:

$$A_1 = \{\mathrm{gde}(b, 2) \ge \mathrm{gde}(a, 2)\} \subseteq B_1 = \{\mathrm{gde}(a + \sqrt{2}\,b) \text{ is even}\},$$
$$A_2 = \{\mathrm{gde}(b, 2) < \mathrm{gde}(a, 2)\} \subseteq B_2 = \{\mathrm{gde}(a + \sqrt{2}\,b) \text{ is odd}\}.$$

We note that each pair of sets $\{A_1, A_2\}$ and $\{B_1, B_2\}$ defines a partition of the real elements of the ring
$\mathbb{Z}[\omega]$. This completes the proof, since for partitions $\{A_1, A_2\}$ and $\{B_1, B_2\}$ of the same set the
inclusions $A_1 \subseteq B_1$, $A_2 \subseteq B_2$ imply $A_1 = B_1$ and $A_2 = B_2$. □
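
Proposition 1 can be spot-checked numerically. The sketch below is an illustration under stated assumptions, not part of the original proof: the helper names are hypothetical, and the check is restricted to nonzero $a$ and $b$ so that every gde is finite. It uses the fact that $a + \sqrt{2}\,b$ is divisible by $\sqrt{2}$ exactly when $a$ is even, in which case the quotient is $b + \sqrt{2}\,(a/2)$.

```python
def gde2(n: int) -> int:
    """Largest k such that 2**k divides the nonzero integer n."""
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k


def gde_sqrt2(a: int, b: int) -> int:
    """gde of a + sqrt(2)*b with respect to sqrt(2); requires (a, b) != (0, 0)."""
    k = 0
    while a % 2 == 0:
        # (a + sqrt(2)*b) / sqrt(2) = b + sqrt(2)*(a/2), integral because a is even
        a, b = b, a // 2
        k += 1
    return k


def proposition1_holds(a: int, b: int) -> bool:
    """Check both alternatives of Proposition 1 for nonzero integers a and b."""
    g = gde_sqrt2(a, b)
    if gde2(b) >= gde2(a):
        return g % 2 == 0 and gde2(a) == g // 2
    return g % 2 == 1 and gde2(b) == (g - 1) // 2


nonzero = [n for n in range(-40, 41) if n != 0]
all_ok = all(proposition1_holds(a, b) for a in nonzero for b in nonzero)
```

For instance, `gde_sqrt2(4, 2)` evaluates to 3, matching the odd case with $\mathrm{gde}(b, 2) = 1 < \mathrm{gde}(a, 2) = 2$.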

To express $|x + \omega^k y|^2$ in the form $a + \sqrt{2}\,b$ in a concise way, we introduce two quadratic forms $P(\cdot)$
and $Q(\cdot)$ with the property:

$$|x|^2 = P(x) + \sqrt{2}\,Q(x). \tag{11}$$

Given that $x$, an element of $\mathbb{Z}[\omega]$, can be expressed in terms of its integer coordinates as
$x = x_0 + x_1\omega + x_2\omega^2 + x_3\omega^3$, we define the quadratic forms as:

$$P(x) := x_0^2 + x_1^2 + x_2^2 + x_3^2, \tag{12}$$

$$Q(x) := x_0(x_1 - x_3) + x_2(x_1 + x_3). \tag{13}$$
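
Identity (11) can be checked numerically. The following sketch (helper names are illustrative assumptions) represents an element of $\mathbb{Z}[\omega]$ by its four integer coordinates and compares $|x|^2$, computed in floating point, with $P(x) + \sqrt{2}\,Q(x)$.

```python
import cmath
import math
from itertools import product

OMEGA = cmath.exp(1j * math.pi / 4)  # omega = e^{i*pi/4}


def P(x):
    """Quadratic form (12): sum of squared coordinates."""
    return sum(c * c for c in x)


def Q(x):
    """Quadratic form (13)."""
    x0, x1, x2, x3 = x
    return x0 * (x1 - x3) + x2 * (x1 + x3)


def as_complex(x):
    """Complex value of x0 + x1*omega + x2*omega^2 + x3*omega^3."""
    return sum(c * OMEGA ** p for p, c in enumerate(x))


# Largest deviation from identity (11) over a small box of coordinates.
max_error = max(
    abs(abs(as_complex(x)) ** 2 - (P(x) + math.sqrt(2) * Q(x)))
    for x in product(range(-3, 4), repeat=4)
)
```

The deviation stays at the level of floating-point rounding, as expected from expanding $x x^*$ with $\omega + \omega^{-1} = \sqrt{2}$.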

Let us rewrite the equality $\mathrm{gde}(|x + \omega^k y|^2) = 4$ in terms of these quadratic forms and the gde of base 2.
Using Proposition 1 we can write:

$$\mathrm{gde}(P(x + \omega^k y), 2) = 2,$$
$$\mathrm{gde}(Q(x + \omega^k y), 2) \ge 2.$$

Similar to the example given at the beginning of this section, we see that it suffices to know the values
of the quadratic forms modulo $2^3$. To compute them, it suffices to know the values of the integer
coefficients of $x$ and $y$ modulo $2^3$. This follows from the expression of the product $\omega y$ in terms of the
integer coefficients:

$$\omega\,(y_0 + y_1\omega + y_2\omega^2 + y_3\omega^3) = -y_3 + y_0\omega + y_1\omega^2 + y_2\omega^3,$$

and from the following two observations:

• the integer coefficients of the sum of two elements of the ring $\mathbb{Z}[\omega]$ are the sums of their
integer coefficients,

• for any element $x$ of $\mathbb{Z}[\omega]$, the values of the quadratic forms $P(x)$ and $Q(x)$ modulo $2^3$ are determined
by the values modulo $2^3$ of the integer coefficients of $x$.
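
The two ingredients above can be sketched as follows. The function names are illustrative assumptions. Multiplication by $\omega$ acts on the coordinate vector as a signed cyclic shift (because $\omega^4 = -1$), and changing any coordinate by a multiple of 8 leaves $P$ and $Q$ unchanged modulo 8.

```python
from itertools import product


def P(x):
    return sum(c * c for c in x)


def Q(x):
    x0, x1, x2, x3 = x
    return x0 * (x1 - x3) + x2 * (x1 + x3)


def times_omega(y):
    """Coordinates of omega * y, using omega^4 = -1."""
    y0, y1, y2, y3 = y
    return (-y3, y0, y1, y2)


def forms_mod8(x):
    return (P(x) % 8, Q(x) % 8)


def residues_suffice() -> bool:
    """Shifting coordinates by multiples of 8 never changes P or Q modulo 8."""
    shifts = list(product((0, 8, -16), repeat=4))
    for x in product(range(-1, 2), repeat=4):
        for u in shifts:
            shifted = tuple(a + b for a, b in zip(x, u))
            if forms_mod8(shifted) != forms_mod8(x):
                return False
    return True
```

Applying `times_omega` four times negates the vector, and eight times returns the original coordinates.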

In summary, to check the second part of Lemma 2 we need to consider all possible values for the
integer coefficients of $x$ and $y$ modulo $2^3$. There are two additional constraints on them. The first
one is $|x|^2 + |y|^2 = 2^m$. Since we assumed $m \ge 4$, we can write necessary conditions to satisfy this
constraint, in terms of the quadratic forms, as:

$$P(x) \equiv -P(y) \pmod{2^3},$$

$$Q(x) \equiv -Q(y) \pmod{2^3}.$$

The second constraint is $\mathrm{gde}(|x|^2) = \mathrm{gde}(|y|^2)$ and $\mathrm{gde}(|x|^2) \le 1$. To check it, we use the same
approach as in the example with $\mathrm{gde}(|x + \omega^k y|^2) = 4$.

We have now introduced the necessary notions required to prove Lemma 3. 

Proof. Our proof is an exhaustive verification, assisted by a computer search. We rewrite the statement 
of the lemma formally as follows: 



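$$G_j = \left\{ (x, y) \in \mathbb{Z}[\omega] \times \mathbb{Z}[\omega] \;\middle|\; \begin{array}{l} \exists\, m \ge 4 \text{ s.t. } |x|^2 + |y|^2 = 2^m, \\ \mathrm{gde}(|x|^2) = \mathrm{gde}(|y|^2) = j \end{array} \right\}, \quad j \in \{0, 1\},$$

for all $(x, y) \in G_j$ and for all $s \in \{1, 2, 3\}$ there exists $k \in \{0, 1, 2, 3\}$ such that

$$\mathrm{gde}(|x + \omega^k y|^2) = s + j. \tag{14}$$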



The sets $G_j$ are infinite, so it is impossible to perform the check directly. As we illustrated with an
example, the equality $\mathrm{gde}(|x + \omega^k y|^2) = s + j$ depends only on the values of the integer coordinates of
$x$ and $y$ modulo $2^3$. If the sets $G_j$ were also defined in terms of the residues modulo $2^3$ we could just
check the lemma in terms of equivalence classes corresponding to different residues. More precisely,
the equivalence relation $\sim$ we would use is:

$$\sum_{p=0}^{3} x_p\,\omega^p \sim \sum_{p=0}^{3} y_p\,\omega^p \iff \text{for all } p \in \{0, 1, 2, 3\}:\ x_p \equiv y_p \pmod{2^3}.$$

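To address the issue, we introduce sets $\mathcal{Q}_j$ that include $G_j$ as subsets:

$$\mathcal{Q}_j = \left\{ (x, y) \in \mathbb{Z}[\omega] \times \mathbb{Z}[\omega] \;\middle|\; \begin{array}{l} \mathrm{gde}(|x|^2) = \mathrm{gde}(|y|^2) = j, \\ P(x) + P(y) \equiv 0 \pmod{2^3}, \\ Q(x) + Q(y) \equiv 0 \pmod{2^3} \end{array} \right\}, \quad j \in \{0, 1\}.$$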



Therefore, in terms of the equivalence classes with respect to the above defined relation $\sim$, the more
general problem can be verified in a finite number of steps. However, the number of equivalence classes
is large. This is why we employ a computer search that performs verification of all cases. To rewrite
(14) into conditions in terms of the equivalence classes it suffices to replace $G_j$ by $\mathcal{Q}_j$, replace $x$ and $y$
by their equivalence classes, and replace $\mathbb{Z}[\omega]$ by the set of equivalence classes $\mathbb{Z}[\omega]/\!\sim$.

Algorithm 2 verifies Lemma 3. We use a bar (e.g., $\bar{x}$ and $\bar{y}$) to represent 4-dimensional vectors with
entries in $\mathbb{Z}_8$, the ring of residues modulo 8. The definition of bilinear forms, multiplication by $\omega$ and
the relations $\mathrm{gde}(|\cdot|^2) = 1, 2, 3, 4$ extend to $\bar{x}$ and $\bar{y}$. We implemented Algorithm 2 and the result of
its execution is true. This completes the proof. □

Algorithm 2 Verification of Lemma 3.

Output: Returns true if the statement of Lemma 3 is correct; otherwise, returns false.

    ▷ Here, $G_{j,a,b}$ is the set of all residue vectors $\bar{x}$ such that $\mathrm{gde}(|\bar{x}|^2) = j$, $P(\bar{x}) = a$, $Q(\bar{x}) = b$.
    for all $x_1, x_2, x_3, x_4 \in \{0, \ldots, 7\}$ do        ▷ generate possible residue vectors;
        $\bar{x} \leftarrow (x_1, x_2, x_3, x_4)$
        $j \leftarrow \mathrm{gde}(|\bar{x}|^2)$, $a \leftarrow P(\bar{x})$, $b \leftarrow Q(\bar{x})$
        if $j \in \{0, 1\}$ then
            add $\bar{x}$ to $G_{j,a,b}$
        end if
    end for
    for all $j \in \{0, 1\}$, $a_x \in \{0, \ldots, 7\}$, $b_x \in \{0, \ldots, 7\}$ do
        $a_y \leftarrow -a_x \bmod 8$, $b_y \leftarrow -b_x \bmod 8$        ▷ consider only those pairs that
        for all $(\bar{x}, \bar{y}) \in G_{j,a_x,b_x} \times G_{j,a_y,b_y}$ do        ▷ satisfy necessary conditions;
            for all $d \in \{1, 2, 3\}$ do
                state ← unfound
                for all $k \in \{0, 1, 2, 3\}$ do
                    $\bar{t} \leftarrow \bar{x} + \omega^k \bar{y}$
                    if $\mathrm{gde}(|\bar{t}|^2) = d + j$ then
                        state ← found
                    end if
                end for
                if state = unfound then
                    return false
                end if
            end for
        end for
    end for
    return true
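
A minimal Python sketch of Algorithm 2 is given below. It is not the authors' implementation: the function names are hypothetical, and the way $\mathrm{gde}(|\bar{t}|^2)$ is read off from $P$ and $Q$ modulo 8 is an assumption derived from Proposition 1 (values of 5 or more cannot be resolved modulo 8 and are reported as `None`).

```python
from collections import defaultdict
from itertools import product


def P(x):
    return sum(c * c for c in x)


def Q(x):
    x0, x1, x2, x3 = x
    return x0 * (x1 - x3) + x2 * (x1 + x3)


def times_omega(y):
    """Coordinates of omega * y, using omega^4 = -1."""
    y0, y1, y2, y3 = y
    return (-y3, y0, y1, y2)


def gde_norm(x):
    """gde of |x|^2 = P(x) + sqrt(2)*Q(x), from residues modulo 8 (None if >= 5)."""
    p, q = P(x) % 8, Q(x) % 8
    if p % 2 == 1:
        return 0
    if q % 2 == 1:
        return 1
    if p % 4 == 2:
        return 2
    if q % 4 == 2:
        return 3
    if p == 4:
        return 4
    return None


def verify_lemma3() -> bool:
    # Group residue vectors by (j, P mod 8, Q mod 8), keeping only j in {0, 1}.
    classes = defaultdict(list)
    for x in product(range(8), repeat=4):
        j = gde_norm(x)
        if j in (0, 1):
            classes[(j, P(x) % 8, Q(x) % 8)].append(x)

    for (j, a, b), xs in list(classes.items()):
        # Partner class satisfying P(x) + P(y) = 0 and Q(x) + Q(y) = 0 modulo 8.
        ys = classes.get((j, -a % 8, -b % 8), [])
        for y in ys:
            rotations = [y]
            for _ in range(3):
                rotations.append(times_omega(rotations[-1]))
            for x in xs:
                reached = {
                    gde_norm(tuple(u + v for u, v in zip(x, r))) for r in rotations
                }
                if not all(s + j in reached for s in (1, 2, 3)):
                    return False
    return True
```

Calling `verify_lemma3()` runs the exhaustive check over all admissible pairs of residue vectors; in pure Python this takes on the order of seconds.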



6 Implementation 

Our C++ implementation of Algorithm 1 is available online at http://code.google.com/p/sqct/.

7 Experimental results 

Table 2 summarizes the results of first obtaining an approximation of the given rotation matrix by a
unitary over the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ using our implementation of the Solovay-Kitaev algorithm [1, 9], and
then decomposing it into a circuit using the exact synthesis Algorithm 1 presented in this paper. We
note that the implementation of our synthesis Algorithm 1 (runtimes found in the column $t_{decomp}$)
is significantly faster than the implementation of the Solovay-Kitaev algorithm used to approximate
the unitary (runtimes reported in the column $t_{approx}$). Furthermore, we were able to calculate
approximating circuits using 5 to 7 iterations of the Solovay-Kitaev algorithm followed by our synthesis
algorithm. The total runtime to approximate and decompose unitaries ranged from approximately
11 to 600 seconds, correspondingly, featuring best approximating errors on the order of $10^{-50}$, and
circuits with up to millions of gates. Actual specifications of all circuits reported, as well as those
synthesized but not explicitly included in Table 2 due to space constraints, may be obtained from
http://qcirc.iqc.uwaterloo.ca/.

Figure 2: Circuit implementing the controlled-$R_z(\varphi)$ gate. Upper qubit is the control, middle qubit
is the target, and bottom qubit is the ancilla.



On each step Algorithm 1 chooses one of four small circuits, H, HT, HT$^2$ (=HP), and HT$^3$, to
reduce sde. In practice, the Pauli-Z gate is often easier to implement than either the Phase or the T gate. The
cost of the P$^\dagger$ and T$^\dagger$ gates is usually the same as that of the respective P and T gates. We took this
into account by writing the circuit HT$^3$ using an equivalent and cheaper form HZT$^\dagger$. This significantly
reduces the number of Phase gates required to implement a unitary. If there is no preference between
the choice of P or Z, or P is preferred to Z, the circuit HPT could be used in place of HT$^3$.
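
The gate identities used above can be confirmed numerically. The sketch below assumes the standard matrices T = diag(1, $\omega$), P = diag(1, $i$) and Z = diag(1, $-1$); the helper names are illustrative. Since T, P and Z are all diagonal, the identities hold regardless of whether a product is read in matrix order or in circuit order.

```python
import cmath
import math

W = cmath.exp(1j * math.pi / 4)  # omega = e^{i*pi/4}
S = 1 / math.sqrt(2)

H = [[S, S], [S, -S]]
T = [[1, 0], [0, W]]
P = [[1, 0], [0, 1j]]
Z = [[1, 0], [0, -1]]


def dagger(m):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(m[j][i]).conjugate() for j in range(2)] for i in range(2)]


def compose(*gates):
    """Matrix product of the given 2x2 gates, in the order written."""
    result = [[1, 0], [0, 1]]
    for g in gates:
        result = [
            [sum(result[i][k] * g[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)
        ]
    return result


def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


ht3_equals_hzt_dagger = close(compose(H, T, T, T), compose(H, Z, dagger(T)))
ht2_equals_hp = close(compose(H, T, T), compose(H, P))
ht3_equals_hpt = close(compose(H, T, T, T), compose(H, P, T))
```

All three comparisons hold exactly, not merely up to a global phase, because $\omega^3 = -\omega^{-1} = i\,\omega$.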

The RAM memory requirement during unitary approximation stage for our implementation is 2.1GB. 
In our experiments we used a single core of the Intel Core i7-2600 (3.40GHz) processor. 

The experimental results reported may be utilized in the construction of an approximate implementation
of the Quantum Fourier Transform (QFT). Figure 2 shows a circuit that employs the technique
from [10] to implement the controlled-$R_z(\varphi)$ using a single ancillary qubit. Note that $|0\rangle$ is the
eigenvector of $R_z(\varphi)$ with the eigenvalue 1, thus this construction works correctly; moreover, no phase is
introduced. Such controlled rotations are used in the standard implementation of the QFT [3]. The
advantage of such a circuit is that it introduces only a small additive constant overhead on the number
of gates required to turn an uncontrolled rotation into a controlled rotation. Indeed, the number of T
gates required for the exact implementation of the Fredkin gate, being 7 [5], is small compared to the
number of T gates required in the approximations of the individual single-qubit rotations (Table 2).
Recall that T gates are more complex in fault tolerant implementations than the Clifford gates [8], and
that the T-counts in the circuits we synthesize are provably minimal for the unitary being implemented.
In comparison to other approaches to constructing a controlled gate out of an uncontrolled gate, such as
replacing each gate in the circuit by its controlled version, or using the decomposition provided by Lemma 5.1
in [2] (achieving a two-qubit controlled gate using three single-qubit uncontrolled gates, which has the
expected effect of roughly tripling the number of T gates), the proposed approach where a single ancilla
is employed appears to be beneficial.

We approximate the controlled-$R_z(\varphi)$ by replacing $R_z(\varphi)$ with its approximation $R'_z(\varphi)$. To evaluate
the quality of such an approximation we need to take into account that the ancillary qubit is always initialized
to $|0\rangle$ and that the controlled rotation is a part of a larger circuit. For this reason, we computed the
completely bounded trace norm [11] of the difference of the channels corresponding to the
controlled-$R_z(\varphi)$ and to its approximation using $R'_z(\varphi)$. Both channels map the space of two qubit density matrices
into the space of three qubit density matrices, as one of the inputs is fixed.

To compute the completely bounded trace norm we used the semidefinite program (SDP) described in
[11]. Usual 64-bit machine precision was not enough for our purposes, so we used the package SDPA-GMP
[12] that employs the GNU Multiple Precision Arithmetic Library to solve the SDP. Also, Mathematica
was used to generate files with the description of SDP problems in the input format of SDPA-GMP. We
found that for all unitaries in Table 2 the ratio of the completely bounded trace norm of the difference of
the two channels and the trace distance between $R_z(\varphi)$ and its approximation belongs to the interval
[2.82842, 2.98741]. In other words, the numerical value of the approximation error for the controlled rotations
using the trace norm is roughly three times that for the corresponding single-qubit rotations using the trace distance
(as per Table 2).






Table 2: Results of the approximation of $R_z(\varphi) = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\varphi} \end{pmatrix}$ by our implementation. Column
$N_I$ contains the number of iterations used by the Solovay-Kitaev algorithm; $n_g$ is the total number of gates
(sum of the next four columns); $n_T$ is the number of T and T$^\dagger$ gates; $n_H$ is the number of Hadamard gates;
$n_P$ is the number of P and P$^\dagger$ gates; $n_{Pl}$ is the number of Pauli gates (note that the combined number of
Pauli-X and Pauli-Y is never more than three for any of the circuits, so $n_{Pl}$ is dominated by Pauli-Z
gates); dist is the trace distance to approximation; $t_{approx}$ is the time spent on the unitary approximation
using the Solovay-Kitaev algorithm (in seconds); $t_{decomp}$ is the time spent on the decomposition of the
approximating unitary into a circuit, per Algorithm 1 (in seconds). Circuit specifications are available
at http://qcirc.iqc.uwaterloo.ca/.

Figure 3: Comparison between ours and Dawson's implementations of the Solovay-Kitaev algorithm.
Vertical axis shows $\log_{10}$ of the number of gates and horizontal axis shows $\log_{10}(1/\varepsilon)$, where $\varepsilon$ is the
projective trace distance between the unitary and its approximation.



We also performed a comparison to Dawson's implementation of the Solovay-Kitaev algorithm (see
Figure 3) available at http://gitorious.org/quantum-compiler/. We ran Dawson's code using
the gate library {H, T} with the maximal sequence length equal to 22 and tile width equal to 0.14.
During this experiment, the memory usage was around 6 GB. For the purpose of the comparison,
gate counts for our implementation are also provided in the {H, T} library (Z = T$^4$ and P = T$^2$). We
used projective trace distance to measure quality of approximation as it is the one used in Dawson's
code. Because of the larger epsilon net used in our implementation we were able to achieve better
approximation quality using fewer iterations of the Solovay-Kitaev algorithm. Usage of the GNU
Multiple Precision Arithmetic Library allowed us to achieve precision up to $10^{-50}$, while Dawson's code
encounters convergence problems when precision reaches $10^{-8}$. The latter explains the behaviour of the last
set of points in the experimental results for Dawson's code reported in Figure 3.

Two other experiments that we performed with Dawson's code involve a resynthesis of the circuits generated
by it using our exact decomposition algorithm. We first resynthesised circuits that were generated by
Dawson's code using the {H, T} library. In most cases, the gate counts were reduced by about 10-20%
(our resulting circuits were further decomposed so as to use the {H, T} gate library). In the other
experiment we used the {H, T, P, Z} gate library with Dawson's implementation. In this case, we were
able to run Dawson's code with sequences of length 9 only, and it used 6 GB of memory. The gate
counts, using our algorithm, decreased by about 40-60%.

8 Conclusion

In this paper, we studied quantum circuits over the Clifford and T library. We proved that in the
single-qubit case the set of unitaries over the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ is equivalent to the set of unitaries computable
by the Clifford and T circuits. We generalized this statement to conjecture that in the n-qubit case
the sets of unitaries computable by the Clifford and T circuits and those over the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ are
equivalent, as long as a single ancillary qubit residing in the state $|0\rangle$ is supplied. While we did not
prove this conjecture, we showed the necessity of the ancilla.

We have also presented a single-qubit synthesis algorithm that uses Pauli, H, P, and T gates. Our 
algorithm is asymptotically optimal in both its performance guarantee, and its complexity. The algo- 
rithm generates circuits with a provably minimal number of Hadamard and T gates. Furthermore, our 
experiments suggest that the P-counts may also be minimal. The total number of times Pauli-X and 
Pauli-Y gates are used in any given circuit generated by the algorithm is limited to at most three. As 
such, our algorithm is likely optimal (up to, possibly, a small additive constant on some of the gate
counts, which, in turn, may be corrected by re-synthesizing the lookup table using a different/suitable
circuit cost metric) in all parameters, except, possibly, the Pauli-Z count.

Appendix A

Here we prove properties of the greatest dividing exponent that was defined and used in Section 4. We
first discuss the base extraction property (4) of gde and then proceed to the proof of special properties
of $\mathrm{gde}(\cdot, \sqrt{2})$. The base extraction property simplifies proofs of all statements related to $\mathrm{gde}(\cdot, \sqrt{2})$.

Proposition 2 (Base extraction property). If $x, y \in \mathbb{Z}[\omega]$, then for any non-negative integer
$k$

$$\mathrm{gde}(y x^k, x) = k + \mathrm{gde}(y, x).$$

Proof. Follows directly from the definition of gde. □

The base extraction property together with the non-negativity of gde provides a simple way to lower
bound the value of gde: if $x^k$ divides $y$ then $\mathrm{gde}(y, x) \ge k$. The inequality for the gde of a sum (3) follows
directly from this, since $x^{\min(\mathrm{gde}(y, x),\, \mathrm{gde}(y', x))}$ divides $y + y'$. The proof of the absorption property (5) follows
easily, as well.

Now we prove properties of gde specific to base $\sqrt{2}$. Instead of proving them for all elements of $\mathbb{Z}[\omega]$
it suffices to prove them for elements of $\mathbb{Z}[\omega]$ that are not divisible by $\sqrt{2}$. We illustrate this with the
example $\mathrm{gde}(x, \sqrt{2}) = \mathrm{gde}(|x|^2, 2)$. We can always write $x = x' (\sqrt{2})^{\mathrm{gde}(x)}$. By the definition of gde,
$\sqrt{2}$ does not divide $x'$. By substituting the expression for $x$ into $\mathrm{gde}(|x|^2, 2)$ and then using the base
extraction property we get:

$$\mathrm{gde}(|x|^2, 2) = \mathrm{gde}(|x'|^2, 2) + \mathrm{gde}(x, \sqrt{2}).$$

Therefore, it suffices to show that $\mathrm{gde}(|x'|^2, 2) = 0$ when $\sqrt{2}$ does not divide $x'$, or, equivalently, when
$\mathrm{gde}(x') = 0$.

The quadratic forms defined in Section 5 will be a useful tool for later proofs. Bilinear forms that
generalize them are important for the proof of the relation for $\mathrm{gde}(\mathrm{Re}(x y^*))$. Effectively, we only need
the values of the mentioned forms modulo 2. For this reason, we also introduce forms that are equivalent
modulo 2 and more convenient for the proofs.

We define the function $F(\cdot, \cdot)$ for elements of $\mathbb{Z}[\omega]$ as follows:

$$F(x, y) := x_0 y_0 + x_1 y_1 + x_2 y_2 + x_3 y_3.$$

Note that the following equality holds, and provides some intuition behind the choice to introduce
$F(\cdot, \cdot)$:

$$\mathrm{Re}(x y^*) = F(x, y) + \frac{1}{\sqrt{2}}\, F(\sqrt{2}\,x, y).$$

Using the formula $\sqrt{2} = \omega - \omega^3$ we can rewrite multiplication by $\sqrt{2}$ as a linear operator:

$$\sqrt{2}\,x:\ x_0 + x_1\omega + x_2\omega^2 + x_3\omega^3 \mapsto (x_1 - x_3) + (x_0 + x_2)\,\omega + (x_1 + x_3)\,\omega^2 + (x_2 - x_0)\,\omega^3. \tag{15}$$

In particular, it is easy to verify that:

$$F(\sqrt{2}\,x, y) = (x_1 - x_3)\,y_0 + (x_0 + x_2)\,y_1 + (x_1 + x_3)\,y_2 + (x_2 - x_0)\,y_3,$$

and, substituting $y = x$,

$$F(\sqrt{2}\,x, x) = 2\,(x_1 - x_3)\,x_0 + 2\,(x_1 + x_3)\,x_2 = 2\,Q(x),$$

which corresponds to the earlier definition shown in equation (13). The definition of $F(\cdot, \cdot)$ written
for $x = y$ results in the earlier definition (12). This shows how $F(\cdot, \cdot)$ generalizes and ties together
the previously introduced $P(\cdot)$ and $Q(\cdot)$.

Furthermore, in modulo 2 arithmetic the following expressions hold true:

$$P(x) \equiv (x_1 + x_3) + (x_0 + x_2) \pmod{2} \tag{16}$$

$$Q(x) \equiv (x_1 + x_3)(x_0 + x_2) \pmod{2} \tag{17}$$

$$F(\sqrt{2}\,x, y) \equiv (x_1 + x_3)(y_0 + y_2) + (x_0 + x_2)(y_1 + y_3) \pmod{2}. \tag{18}$$

It is easy to verify these equations by expanding the left and right hand sides.
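
The expansions can also be checked mechanically. The sketch below (function names are illustrative assumptions) implements the linear operator (15) on coordinate vectors and tests the relations $F(x, x) = P(x)$, $F(\sqrt{2}\,x, x) = 2Q(x)$ and the congruences (16) to (18) on a small grid of integer vectors.

```python
from itertools import product


def P(x):
    return sum(c * c for c in x)


def Q(x):
    x0, x1, x2, x3 = x
    return x0 * (x1 - x3) + x2 * (x1 + x3)


def F(x, y):
    """Bilinear form F(x, y) = x0*y0 + x1*y1 + x2*y2 + x3*y3."""
    return sum(a * b for a, b in zip(x, y))


def sqrt2_times(x):
    """Coordinates of sqrt(2)*x, following the linear operator (15)."""
    x0, x1, x2, x3 = x
    return (x1 - x3, x0 + x2, x1 + x3, x2 - x0)


def identities_hold(x, y) -> bool:
    x0, x1, x2, x3 = x
    y0, y1, y2, y3 = y
    u, v = x1 + x3, x0 + x2
    return (
        F(x, x) == P(x)
        and F(sqrt2_times(x), x) == 2 * Q(x)
        and (P(x) - (u + v)) % 2 == 0                      # (16)
        and (Q(x) - u * v) % 2 == 0                        # (17)
        and (F(sqrt2_times(x), y) - (u * (y0 + y2) + v * (y1 + y3))) % 2 == 0  # (18)
    )


all_ok = all(
    identities_hold(x, y)
    for x in product(range(-2, 3), repeat=4)
    for y in product(range(2), repeat=4)
)
```

Since the congruences depend only on parities, a grid covering both parities in every coordinate is enough to exercise all cases.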

The next proposition shows how we use equivalent quadratic and bilinear forms.

Proposition 3. If $\mathrm{gde}(x) = 0$ there are only two alternatives:

• $P(x)$ is even and $Q(x)$ is odd,

• $P(x)$ is odd and $Q(x)$ is even.

Proof. The equality $\mathrm{gde}(x) = 0$ implies that 2 does not divide $\sqrt{2}\,x$. Using expression (15) for $\sqrt{2}\,x$ in
terms of integer coefficients we conclude that at least one of the four numbers $x_1 \pm x_3$, $x_0 \pm x_2$ must
be odd. Suppose that $x_1 + x_3$ is odd. Using formulas (16, 17) we conclude that the values of $P(x)$ and
$Q(x)$ must have different parity. The remaining three cases are similar. □

An immediate corollary is: $\mathrm{gde}(x) = 0$ implies $\mathrm{gde}(|x|^2, 2) = 0$. To show this it suffices to use
expression (11) for $|x|^2$ in terms of quadratic forms.
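
Proposition 3 depends only on the parities of the coordinates, so it can be confirmed by enumeration. In the sketch below (names are illustrative assumptions), $\mathrm{gde}(x) = 0$ is tested through the criterion used in the proof: 2 does not divide $\sqrt{2}\,x$, that is, some coordinate of $\sqrt{2}\,x$ is odd.

```python
from itertools import product


def P(x):
    return sum(c * c for c in x)


def Q(x):
    x0, x1, x2, x3 = x
    return x0 * (x1 - x3) + x2 * (x1 + x3)


def sqrt2_times(x):
    """Coordinates of sqrt(2)*x, following the linear operator (15)."""
    x0, x1, x2, x3 = x
    return (x1 - x3, x0 + x2, x1 + x3, x2 - x0)


def gde_is_zero(x) -> bool:
    """True when sqrt(2) does not divide x, i.e. 2 does not divide sqrt(2)*x."""
    return any(c % 2 != 0 for c in sqrt2_times(x))


def proposition3_holds(x) -> bool:
    parities_differ = (P(x) % 2) != (Q(x) % 2)
    # When sqrt(2) divides x, both forms are even, so the parities coincide.
    return parities_differ == gde_is_zero(x)


all_ok = all(proposition3_holds(x) for x in product(range(4), repeat=4))
```

The check also covers the complementary case: whenever $\sqrt{2}$ divides $x$, both $P(x)$ and $Q(x)$ are even.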

We can also conclude that $\sqrt{2}$ divides $x$ if and only if 2 divides $|x|^2$. Sufficiency follows from the
definition of gde. To prove that 2 divides $|x|^2$ implies $\sqrt{2}$ divides $x$, we assume that 2 divides $|x|^2$ and
$\sqrt{2}$ does not divide $x$, which leads to a contradiction. This also results in the inequality $\mathrm{gde}(|x|^2) \le 1$
when $\mathrm{gde}(x) = 0$.

We use the next two propositions to prove the inequality for $\mathrm{Re}(\sqrt{2}\,x y^*)$.

Proposition 4. Let $\mathrm{gde}(x) = 0$:

• if $\sqrt{2}$ divides $|x|^2$ then $P(x)$ is even and $Q(x)$ is odd,

• if $\sqrt{2}$ does not divide $|x|^2$ then $P(x)$ is odd and $Q(x)$ is even.

Proof. As discussed, the previous proposition implies that $\sqrt{2}$ divides $y$ if and only if 2 divides $|y|^2$.
We apply this to $|x|^2$. By expressing $|x|^4$ in terms of quadratic forms we get:

$$|x|^4 = P(x)^2 + 2\,Q(x)^2 + 2\sqrt{2}\,P(x)\,Q(x).$$

We see that 2 divides $|x|^4$ if and only if 2 divides $P(x)^2$, or, equivalently, $\sqrt{2}$ divides $|x|^2$ if and only
if $P(x)$ is even. Using the previous proposition again, this time for $x$, we obtain the required result. □

Proposition 5. Let $\mathrm{gde}(x) = 0$ and $\mathrm{gde}(y) = 0$. If $\sqrt{2}$ divides $|x|^2$ and $\sqrt{2}$ divides $|y|^2$ then $\sqrt{2}$
divides $\mathrm{Re}(\sqrt{2}\,x y^*)$.

Proof. By the previous proposition, $\sqrt{2}$ divides $|x|^2$ and $\sqrt{2}$ divides $|y|^2$ implies that $Q(x)$ and $Q(y)$ are
odd. Formula (17) implies that, in terms of the integer coefficients of $x$ and $y$, the integers
$x_1 + x_3$, $x_0 + x_2$, $y_1 + y_3$, $y_0 + y_2$ are all odd. Expressing $\mathrm{Re}(\sqrt{2}\,x y^*)$ in terms of $F(\cdot, \cdot)$,

$$\mathrm{Re}(\sqrt{2}\,x y^*) = \sqrt{2}\,F(x, y) + F(\sqrt{2}\,x, y),$$

and using expression (18), we conclude that 2 divides $F(\sqrt{2}\,x, y)$; therefore $\sqrt{2}$ divides $\mathrm{Re}(\sqrt{2}\,x y^*)$. □

Now we show $\mathrm{gde}(\mathrm{Re}(\sqrt{2}\,x y^*)) \ge \left\lfloor \frac{1}{2}\left(\mathrm{gde}(|x|^2) + \mathrm{gde}(|y|^2)\right) \right\rfloor$. As we discussed in the beginning,
we can assume $\mathrm{gde}(x) = 0$ and $\mathrm{gde}(y) = 0$ without loss of generality. This implies $\mathrm{gde}(|x|^2) \le 1$ and
$\mathrm{gde}(|y|^2) \le 1$. The expression $\left\lfloor \frac{1}{2}\left(\mathrm{gde}(|x|^2) + \mathrm{gde}(|y|^2)\right) \right\rfloor$ can only be equal to 0 or 1. The second
one is only possible when $\mathrm{gde}(|x|^2) = 1$ and $\mathrm{gde}(|y|^2) = 1$, in which case the previous proposition
implies $\mathrm{gde}(\mathrm{Re}(\sqrt{2}\,x y^*)) \ge 1$. In the first case the inequality is true because of the non-negativity of gde.

We can also use quadratic forms to describe all numbers $z$ in the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ such that $|z|^2 = 1$.
Seeking a contradiction, suppose $\mathrm{sde}(z) \ge 1$. We can always write $z = \frac{x}{(\sqrt{2})^k}$ where $k = \mathrm{sde}(z)$ and
$\mathrm{gde}(x) = 0$. From the other side $|x|^2 = P(x) + \sqrt{2}\,Q(x) = 2^k$. Thus we have a contradiction with the
statement of Proposition 3. We conclude that $z$ is an element of $\mathbb{Z}[\omega]$. Therefore we can write $z$ in
terms of its integer coordinates, $z = z_0 + z_1\omega + z_2\omega^2 + z_3\omega^3$. The equality $|z|^2 = 1$ implies that
$F(z, z) = z_0^2 + z_1^2 + z_2^2 + z_3^2 = 1$. Taking into account that the $z_j$ are integers we conclude that
$z \in \{\omega^k,\ k = 0, \ldots, 7\}$.
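
The last step can be illustrated by enumeration. The sketch below (names are illustrative assumptions) lists all integer coordinate vectors with $P(z) = 1$ and $Q(z) = 0$ and compares them with the coordinates of the eight powers of $\omega$. Restricting the search to coordinates in $\{-1, 0, 1\}$ loses nothing, since $P(z) = 1$ forces every coordinate to have absolute value at most 1.

```python
from itertools import product


def P(x):
    return sum(c * c for c in x)


def Q(x):
    x0, x1, x2, x3 = x
    return x0 * (x1 - x3) + x2 * (x1 + x3)


def times_omega(y):
    """Coordinates of omega * y, using omega^4 = -1."""
    y0, y1, y2, y3 = y
    return (-y3, y0, y1, y2)


# All coordinate vectors with |z|^2 = P(z) + sqrt(2)*Q(z) = 1.
units = {z for z in product(range(-1, 2), repeat=4) if P(z) == 1 and Q(z) == 0}

# Coordinates of omega^k for k = 0, ..., 7.
powers = []
current = (1, 0, 0, 0)
for _ in range(8):
    powers.append(current)
    current = times_omega(current)
```

The two collections coincide: there are exactly eight such vectors, one for each power of $\omega$.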



Appendix B 

Here we prove that Algorithm 1 produces circuits with the minimal number of Hadamard and T gates
over the gate library $\mathcal{G}$ consisting of Hadamard, T, T$^\dagger$, P, P$^\dagger$, and Pauli-X, Y, and Z gates. We say that
a circuit implements a unitary $U$ if the unitary corresponding to the circuit is equal to $U$ up to global
phase. We define integer-valued quantities $h(U)$ and $t(U)$ as the minimal number of Hadamard and
T gates, respectively, over all circuits implementing $U$. We call a circuit H- or T-optimal if it contains the minimal
number of H or T gates, correspondingly.

Theorem 2. Let $U$ be a $2 \times 2$ unitary over the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ with a matrix entry $z$ such that $\mathrm{sde}(|z|^2) \ge 4$.
Algorithm 1 produces a circuit that implements $U$ over $\mathcal{G}$ with:

1. the minimal number of Hadamard gates and $h(U) = \mathrm{sde}(|z|^2) - 1$, and

2. the minimal number of T gates and $t(U) = h(U) - 1 + (l \bmod 2) + (j \bmod 2)$, where $l$ and $j$ are
chosen such that $h(H T^l U T^j H) = h(U) + 2$.

Proof. 1: H-optimality. Using brute force, we explicitly verified that the set of unitaries implemented by H-optimal circuits with
precisely 3 Hadamard gates is equal to the set of all unitaries over the ring $\mathbb{Z}[\frac{1}{\sqrt{2}}, i]$ with $\mathrm{sde}(|z|^2) = 4$.

Suppose we have a unitary $U$ with $\mathrm{sde}(|z|^2) = n \ge 4$. With the help of Algorithm 1 we can reduce it
to a unitary with $\mathrm{sde}(|z|^2) = 4$ while using $n - 4$ Hadamard gates to accomplish this. As such, there
exists a circuit with $n - 1$ Hadamard gates that implements $U$.

Now consider an H-optimal circuit $C$ that implements $U$. Using brute force, we established that if $C$
has fewer than 3 Hadamard gates, then $\mathrm{sde}(|z|^2)$ is less than 4. Suppose $C$ contains $m \ge 3$ Hadamard
gates. Its prefix, containing 3 Hadamard gates, must also be H-optimal, and therefore $\mathrm{sde}(|z|^2)$ of the
corresponding unitary is 4. Now, using the inequality from Lemma 2, we conclude that $\mathrm{sde}(|z|^2)$ of the
unitary corresponding to $C$ is at most $m + 1$. This implies $n \le m + 1$. Since we already know that
$m \le n - 1$, we may conclude that $m = n - 1$ and $m$ is the number of Hadamard gates in the circuit
produced by Algorithm 1 in combination with the brute force step.

2: T-optimality. To prove T-optimality we introduce a normal form for circuits over $\mathcal{G}$. We call
a circuit HT-normal if there is precisely one T gate between every two H gates and, symmetrically,
precisely one H gate between every two T gates. It is not difficult to modify Algorithm 1 to produce a
circuit in HT-normal form while preserving its H-optimality. To accomplish that, first, recall that
$HT^3 = HZT^\dagger$ and that all circuits generated during the brute force stage are both H-optimal and in the
HT-normal form. Second, any circuit produced by the algorithm is H-optimal and does not contain the
subcircuit $HT^2H = HPH = \omega P^\dagger H P^\dagger$, which is not H-optimal (up to global phase).

We will show that any H-optimal circuit in HT-normal form is also T-optimal. We start with a special
case of HT-normal circuits: those that begin and end with the Hadamard gate, in other words, those
that can be written as $H S_1 H \ldots H S_k H$, and are H-optimal. Let $U$ be a unitary corresponding to this
circuit. Due to HT-normality, each $S_i$ contains exactly one T gate, the number of T gates in the circuit
is $k$, and $h(U) = k + 1$; therefore, $t(U) \le h(U) - 1$. To prove that $t(U) = h(U) - 1$, it suffices to show
that $t(U) \ge h(U) - 1$. Let us write a T-optimal circuit for $U$ as $C_0 T C_1 T \ldots T C_n$, where $n = t(U)$. Each subcircuit $C_i$
implements a unitary from the Clifford group. Each unitary from the single-qubit Clifford group can
be implemented using at most one H gate (recall that we are concerned with the implementations up
to global phase), therefore $h(U) \le t(U) + 1$, as required.

In the general case, consider a circuit obtained by Algorithm 1 and implementing a unitary $V$ with
$h(V) \ge 3$ that is H-optimal and written in HT-normal form; we show that it is T-optimal. We can write
it as $S_0 H S_1 H \ldots H S_k H S_{k+1}$. By Lemma 3 we can always find $l$ and $j$ such that
$C := H T^l S_0 H S_1 H \ldots H S_k H S_{k+1} T^j H$ is also an H-optimal circuit. Indeed, according to Lemma 3, using the connection between
$\mathrm{sde}(\cdot)$ and $h(\cdot)$ described in the first part of the proof, given $h(V) = k + 1$ we can always find $l$ such
that $h(H T^l V) = k + 2$. From the other side, the circuit $H T^l S_0 H S_1 H \ldots H S_k H S_{k+1}$ contains $k + 2$ Hadamard
gates and therefore is H-optimal. We repeat the same procedure to find $j$.

Considering the different possible values of $l$ and $j$ allows us to complete the proof of the Theorem. This is
somewhat tedious, and we illustrate how to handle different cases with a representative example of $l = 3$
and $j = 2$. In such a case, we can rewrite circuit $C$ as $C = H T P S_0 H S_1 H \ldots H S_k H S_{k+1} P H$. We conclude
that $S_0$ must have zero T gates and $S_{k+1}$ must have one T gate. Otherwise subcircuits $H T P S_0 H$ and
$H S_{k+1} P H$ will not be H-optimal. As such, we reduced the problem to the special case considered above,
therefore circuit $C$ is T-optimal and $S_0 H S_1 H \ldots H S_k H S_{k+1}$ is T-optimal as its subcircuit. In the general
case, the following formula may be developed: $t(V) = h(V) - 1 + (l \bmod 2) + (j \bmod 2)$. □